Communicating Research Findings

PSCI 2270 - Week 13

Georgiy Syunyaev

Department of Political Science, Vanderbilt University

November 14, 2024

Plan for this week



  1. Communicating research findings

  2. Presentations

  3. Let’s make our workhorses with R

  4. Q&A on projects or class

Communicating research findings

The issue with communication


  • There is a lot of information in the world

    • There is even a lot of information in your individual projects… TOO MUCH
  • You have two tasks

    1. Understand what results you have: Descriptive, observational, experimental
    2. Decide how to present your results: Simplify and tell a story with your data
  • In the end you will have a result from the data AND you will want others to understand and believe it

How we usually communicate

Two main approaches: Tables and Graphs

  • Tables can be more informative for an advanced audience (one that understands point estimates and standard errors)
  • Graphs can be more informative for a general audience
  • Sometimes it is not enough to just look at the table! \(\Rightarrow\) Always plot your data!

Anscombe’s Quartet


  • F.J. Anscombe (1973): “…make both calculations and graphs. Both sorts of output should be studied; each will contribute to understanding.”
as_tibble(anscombe)
# A tibble: 11 × 8
      x1    x2    x3    x4    y1    y2    y3    y4
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1    10    10    10     8  8.04  9.14  7.46  6.58
 2     8     8     8     8  6.95  8.14  6.77  5.76
 3    13    13    13     8  7.58  8.74 12.7   7.71
 4     9     9     9     8  8.81  8.77  7.11  8.84
 5    11    11    11     8  8.33  9.26  7.81  8.47
 6    14    14    14     8  9.96  8.1   8.84  7.04
 7     6     6     6     8  7.24  6.13  6.08  5.25
 8     4     4     4    19  4.26  3.1   5.39 12.5 
 9    12    12    12     8 10.8   9.13  8.15  5.56
10     7     7     7     8  4.82  7.26  6.42  7.91
11     5     5     5     8  5.68  4.74  5.73  6.89

Anscombe’s Quartet: Estimates 😎

  • There are four studies. Let’s look at the statistical relationship between \(X\) and \(Y\) for each of them
lm(y1 ~ x1, data = anscombe)

Call:
lm(formula = y1 ~ x1, data = anscombe)

Coefficients:
(Intercept)           x1  
     3.0001       0.5001  
lm(y3 ~ x3, data = anscombe)

Call:
lm(formula = y3 ~ x3, data = anscombe)

Coefficients:
(Intercept)           x3  
     3.0025       0.4997  
lm(y2 ~ x2, data = anscombe)

Call:
lm(formula = y2 ~ x2, data = anscombe)

Coefficients:
(Intercept)           x2  
      3.001        0.500  
lm(y4 ~ x4, data = anscombe)

Call:
lm(formula = y4 ~ x4, data = anscombe)

Coefficients:
(Intercept)           x4  
     3.0017       0.4999  
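  • Note: The slope is not the correlation coefficient, and removing the intercept (y1 ~ -1 + x1) does not turn it into one. To get the linear correlation use the cor() function; the slope equals the correlation only when both \(X\) and \(Y\) are standardized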
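A minimal sketch of the cor() route, using only base R. It computes the correlation within each of the four studies and then checks that the slope of a regression on standardized variables equals the correlation. The object names cors, z and slope_std are introduced here for illustration only.

```r
# anscombe ships with base R (datasets package), so nothing needs installing

# Correlation between x and y within each of the four studies
cors <- sapply(1:4, function(i) {
  cor(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]])
})
round(cors, 3)

# Standardize x1 and y1 (mean 0, standard deviation 1)
z <- data.frame(
  x = as.numeric(scale(anscombe$x1)),
  y = as.numeric(scale(anscombe$y1))
)

# The slope on standardized variables equals the correlation
slope_std <- unname(coef(lm(y ~ x, data = z))["x"])
all.equal(slope_std, cors[1])
```

All four correlations come out nearly identical, which is exactly why the numbers alone cannot tell the four studies apart.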

Anscombe’s Quartet: Long format 👌

  • Long format is useful for ggplot2: Stack \(X\)’s and \(Y\)’s for each study on top of each other

    • In tidyverse we also call this long format tidy (which is where the name tidyverse comes from)
as_tibble(anscombe_tidy)
# A tibble: 44 × 4
   study    id     x     y
   <chr> <int> <dbl> <dbl>
 1 1         1    10  8.04
 2 2         1    10  9.14
 3 3         1    10  7.46
 4 4         1     8  6.58
 5 1         2     8  6.95
 6 2         2     8  8.14
 7 3         2     8  6.77
 8 4         2     8  5.76
 9 1         3    13  7.58
10 2         3    13  8.74
# ℹ 34 more rows
lm(y ~ x, data = anscombe_tidy)

Call:
lm(formula = y ~ x, data = anscombe_tidy)

Coefficients:
(Intercept)            x  
     3.0013       0.4999  
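The slides do not show how anscombe_tidy was built. One possible way, assuming dplyr and tidyr are installed, is to add a row id and reshape with pivot_longer(); the code actually used for the slides may differ.

```r
library(dplyr)
library(tidyr)

anscombe_tidy <- anscombe |>
  mutate(id = row_number()) |>          # one id per original row
  pivot_longer(
    cols = -id,
    # "x1" is split into the column name ("x") and the study ("1")
    names_to = c(".value", "study"),
    names_pattern = "([xy])([1-4])"
  ) |>
  select(study, id, x, y)               # same column order as in the printout

anscombe_tidy
```

Each original row becomes four rows, one per study, so the 11 rows of anscombe turn into 44.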

Anscombe’s Quartet: Pooled plot 🙆‍♂️

ggplot(data = anscombe_tidy, 
       mapping = aes(x = x, y = y)) +
  geom_smooth(method = lm, se = FALSE, color = "grey") +
  geom_point() + 
  coord_equal() +
  theme_bw()

Anscombe’s Quartet: Split plot 🤯

ggplot(data = anscombe_tidy, 
       mapping = aes(x = x, y = y, color = study)) +
  geom_smooth(method = lm, se = FALSE, color = "grey") +
  geom_point() +
  coord_equal() +
  facet_wrap(~ study) +
  theme_bw()

Anscombe 2.0 Cairo; Matejka & Fitzmaurice


Corruption and Human Development



  • What general patterns can we get from this plot?

Rigid middle class



  • What general patterns can we get from this plot?

Differences in aspirations in 2021



  • What general patterns can we get from this plot?

King (2006)



All tables and figures should be separately and fully documented. Someone reading only them, without the paper, should be able to understand what is going on. Adding an explanatory paragraph at the bottom of each figure or table is usually necessary to accomplish this. Similarly, someone who reads the paper and ignores the table or figure should also be able to follow it all. The point of the text is to walk the reader by the hand through the table or figure so it is easy to understand. Picking out one number in the table and explaining it in detail at the outset as an example is often a good strategy.

Graphing types


  1. Position along a common scale: Spatial location along a common baseline to represent data

    • Example: Bar chart where the length of each bar represents a certain value, and each bar starts from the same baseline.
  2. Position along non-aligned scales: Position is still used, but the baselines differ

    • Example: Grouped bar chart where each group has a different baseline
  3. Length, direction, angle: Comparisons based on length are fairly accurate. The direction is slightly less so, and the angle even less.

    • Example: A stacked bar chart (length) or a pie chart (angle)

Graphing types


  4. Area: Area is less accurately perceived than the above

    • Example: Circle size in a bubble chart
  5. Volume and curvature: These are difficult for us to evaluate accurately

    • Example: 3D charts that use volume to represent data
  6. Shading and colour saturation: These are the least accurately perceived

    • Example: Heatmap

Hierarchy by Munzner (2014)


  • How would you test this?

Experimental evidence (!!)

  • Cleveland and McGill (1984): Foundational experiments
  • Heer and Bostock (2010): Using Amazon MTurk crowd-sourcing sample
  • Davis et al. (2022): Account for differences in respondent characteristics
  • Position > angle ? area ? volume \(\Rightarrow\) Position rules!
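To get a feel for why position beats angle, one can draw the same numbers twice. The sketch below uses made-up shares (the data frame shares is hypothetical, not from any study cited here) and builds a bar chart and a pie chart with ggplot2.

```r
library(ggplot2)

# Hypothetical shares, made up purely for illustration
shares <- data.frame(
  group = c("A", "B", "C", "D", "E"),
  share = c(0.23, 0.21, 0.20, 0.19, 0.17)
)

# Position along a common scale: bars start from the same baseline
p_bar <- ggplot(shares, aes(x = group, y = share)) +
  geom_col() +
  theme_minimal()

# Angle: the same numbers encoded as slices of a pie
p_pie <- ggplot(shares, aes(x = "", y = share, fill = group)) +
  geom_col(width = 1) +
  coord_polar(theta = "y") +
  theme_void()

p_bar
p_pie
```

Try ranking the five groups from each plot without looking at the numbers.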

Common issues with graphs

  • Not enough information (e.g. improper or missing labels)
  • Hard to compare data (e.g. using 3D shapes)
  • Not being transparent (e.g. about scales)
  • Cutting data (e.g. using binary indicators instead of averages)
  • Too cluttered/Too much data
  • More examples here

Which plots we use

  • Workhorses

    • Histograms

    • Scatterplots

    • Time trends

    • Dot-whisker / Box plots

  • …Ponies
  • …Unicorns
gapminder |> 
  ggplot2::ggplot(
    mapping = aes(x = gdpPercap)) +
  ggplot2::geom_histogram(bins = 30, fill = "lightgrey", color = "black") +
  ggplot2::labs(title = "Histogram of GDP Per Capita") +
  ggplot2::theme_minimal()

gapminder |> 
  ggplot2::ggplot(
    mapping = aes(x = gdpPercap, y = lifeExp, color = continent)) +
  ggplot2::geom_point(alpha = 0.2) +
  ggplot2::stat_smooth(se = FALSE) +
  ggplot2::scale_x_log10() +
  ggplot2::labs(
    title = "Life Expectancy vs GDP Per Capita by Continent",
    x = "log(gdpPercap)") +
  ggplot2::theme_minimal()

gapminder |> 
  dplyr::group_by(continent, year) |> 
  dplyr::summarise(lifeExp = mean(lifeExp, na.rm = TRUE)) |> 
  dplyr::mutate(type = "Continent average") |> 
  dplyr::bind_rows(gapminder) |> 
  dplyr::mutate(type = ifelse(is.na(type), "Country", type),
                # as.character() is needed: ifelse() on factors returns integer codes
                country = ifelse(is.na(country), 
                                 as.character(continent), 
                                 as.character(country))) |> 
  ggplot2::ggplot(
    mapping = aes(x = year, y = lifeExp, 
                  group = country, color = type, alpha = type)) +
  ggplot2::geom_line(linewidth = 0.8, show.legend = FALSE) +
  ggplot2::facet_wrap(~continent, ncol = 2) +
  ggplot2::scale_alpha_manual(values = c(1, .1)) +
  ggplot2::labs(
    title = "Trends in Life Expectancy by Continent") + 
  ggplot2::theme_minimal() 

gapminder |> 
  ggplot2::ggplot(
    mapping = aes(x = continent, y = lifeExp)) +
  ggplot2::geom_boxplot(outlier.colour = "hotpink") +
  ggplot2::geom_jitter(position = position_jitter(width = 0.1, height = 0), 
                       alpha = 0.25) +
  ggplot2::labs(title = "Box Plot of Life Expectancy by Continent") +
  ggplot2::theme_minimal()

Presentations after Thanksgiving

Reminders


  • Complete Human Subjects Training provided by CITI

    • You can receive the certificate for that through Vandy for free here (read guide here)
    • You need to complete the basic module and upload it via Brightspace (or through OSF)
  • Final presentations are next week

    • Each of you will have 5-7 minutes for presentation with 3-5 minutes of feedback
    • Sign up for your slot here by Thursday
    • There will be pizza and drinks!
  • Q&A about final projects and class in general on Thursday next week

Final presentations


  • 5-7 minutes \(\approx\) 5-7 slides
  • Do not put too much text on the slides:

    • \(6 \times 6\) rule: Unless absolutely unavoidable, don’t put more than 6 words per line and 6 lines per slide
    • Try not to read from slides
  • What to include: Motivation, Research Question, Hypotheses, Research Design (Context/Unit of analysis/Experiment or Observational), Independent/Dependent variables, Measurement and procedures

Let’s make our workhorses with R

Recap on tidyverse



  • Remember our friend tidyverse package?

    • It works best with so-called tidy datasets
    • It provides a full suite of functions that allow us to load (readr and haven), manipulate (dplyr and tidyr) and plot (ggplot2) our data
    • It presumes we are using pipelines: %>% or |> feeds the left-hand side expression into the right-hand side expression
  • We will skip the loading of data today and just learn how to quickly feed tidy data to ggplot2
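A small sketch of what the pipe does, using the built-in mtcars data rather than the class data (an assumption made so the example runs on its own). The nested call and the piped call are meant to give the same result; the pipe just lets us read the steps from top to bottom.

```r
library(dplyr)

# Nested: read from the inside out
nested <- summarise(filter(mtcars, cyl == 4), mean_mpg = mean(mpg))

# Piped: the left-hand side becomes the first argument of the right-hand side
piped <- mtcars |>
  filter(cyl == 4) |>
  summarise(mean_mpg = mean(mpg))

all.equal(nested, piped)
```

The same pipeline can be written with %>% in place of |>.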

tidy data



tidy long data



Install and load packages


  • To start off we need to install and load our packages
# this will install packages (you only do this once)

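install.packages("tidyverse") # this package loads all other packages we need
install.packages("gapminder") # we will use data from this package
install.packages("socviz")    # the gss_sm dataset used below comes from this package
install.packages("Hmisc")     # needed later by stat_summary(fun.data = "mean_cl_normal")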
  • Load the packages you installed
# this will LOAD packages (you do this every time you work with it)

library("tidyverse")
library("gapminder")
library("socviz")
  • Load the data we need (gss_sm)
# this will load the dataset gss_sm from socviz package

data("gss_sm")
  • We can look at the help file for the dataset using ?
# this will open help page for the dataset in RStudio

?gss_sm

Let’s look at the data

  • To look at the data you just need to type the name of the dataset
gss_sm
# A tibble: 2,867 × 32
    year    id ballot       age childs sibs   degree race  sex   region income16
   <dbl> <dbl> <labelled> <dbl>  <dbl> <labe> <fct>  <fct> <fct> <fct>  <fct>   
 1  2016     1 1             47      3 2      Bache… White Male  New E… $170000…
 2  2016     2 2             61      0 3      High … White Male  New E… $50000 …
 3  2016     3 3             72      2 3      Bache… White Male  New E… $75000 …
 4  2016     4 1             43      4 3      High … White Fema… New E… $170000…
 5  2016     5 3             55      2 2      Gradu… White Fema… New E… $170000…
 6  2016     6 2             53      2 2      Junio… White Fema… New E… $60000 …
 7  2016     7 1             50      2 2      High … White Male  New E… $170000…
 8  2016     8 3             23      3 6      High … Other Fema… Middl… $30000 …
 9  2016     9 1             45      3 5      High … Black Male  Middl… $60000 …
10  2016    10 3             71      4 1      Junio… White Male  Middl… $60000 …
# ℹ 2,857 more rows
# ℹ 21 more variables: relig <fct>, marital <fct>, padeg <fct>, madeg <fct>,
#   partyid <fct>, polviews <fct>, happy <fct>, partners <fct>, grass <fct>,
#   zodiac <fct>, pres12 <labelled>, wtssall <dbl>, income_rc <fct>,
#   agegrp <fct>, ageq <fct>, siblings <fct>, kids <fct>, religion <fct>,
#   bigregion <fct>, partners_rc <fct>, obama <dbl>
  • We can also print all the names of the variables using names()
names(gss_sm)
 [1] "year"        "id"          "ballot"      "age"         "childs"     
 [6] "sibs"        "degree"      "race"        "sex"         "region"     
[11] "income16"    "relig"       "marital"     "padeg"       "madeg"      
[16] "partyid"     "polviews"    "happy"       "partners"    "grass"      
[21] "zodiac"      "pres12"      "wtssall"     "income_rc"   "agegrp"     
[26] "ageq"        "siblings"    "kids"        "religion"    "bigregion"  
[31] "partners_rc" "obama"      

Let’s look at the data


  • We can also look at some observations of a specific variable using $ and []
gss_sm$region[1:10]
 [1] New England     New England     New England     New England    
 [5] New England     New England     New England     Middle Atlantic
 [9] Middle Atlantic Middle Atlantic
9 Levels: New England Middle Atlantic E. Nor. Central ... Pacific
gss_sm$obama[1:10]
 [1]  0  1  0  0  1  1 NA NA NA  0
  • Or even produce some summaries of our variables using table() or summary()
table(gss_sm$religion, useNA = "ifany")

Protestant   Catholic     Jewish       None      Other       <NA> 
      1371        649         51        619        159         18 
summary(gss_sm$age)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  18.00   34.00   49.00   49.16   62.00   89.00      10 

Recap on tidyverse: Selecting


gss_sm %>%
  select(bigregion, race, age, obama)
# A tibble: 2,867 × 4
   bigregion race    age obama
   <fct>     <fct> <dbl> <dbl>
 1 Northeast White    47     0
 2 Northeast White    61     1
 3 Northeast White    72     0
 4 Northeast White    43     0
 5 Northeast White    55     1
 6 Northeast White    53     1
 7 Northeast White    50    NA
 8 Northeast Other    23    NA
 9 Northeast Black    45    NA
10 Northeast White    71     0
# ℹ 2,857 more rows

Recap on tidyverse: Filtering


gss_sm %>%
  select(bigregion, race, age, obama) %>%
  drop_na() # this is special case of filter()
# A tibble: 1,728 × 4
   bigregion race    age obama
   <fct>     <fct> <dbl> <dbl>
 1 Northeast White    47     0
 2 Northeast White    61     1
 3 Northeast White    72     0
 4 Northeast White    43     0
 5 Northeast White    55     1
 6 Northeast White    53     1
 7 Northeast White    71     0
 8 Northeast Black    32     1
 9 Northeast Black    60     1
10 Northeast White    76     0
# ℹ 1,718 more rows
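To see why drop_na() is a special case of filter(), here is a minimal sketch on a made-up tibble (toy is hypothetical and only borrows two variable names from gss_sm). Dropping rows with any missing value is the same as keeping rows where every variable is non-missing.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with some missing values
toy <- tibble(
  race  = c("White", "Black", NA, "Other"),
  obama = c(0, NA, 1, 1)
)

dropped  <- toy |> drop_na()
filtered <- toy |> filter(!is.na(race), !is.na(obama))

all.equal(dropped, filtered)
```

With many variables drop_na() is simply shorter to write than one !is.na() condition per variable.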

Recap on tidyverse: Grouping


gss_sm %>%
  select(bigregion, race, age, obama) %>%
  drop_na() %>% 
  group_by(bigregion, race)
# A tibble: 1,728 × 4
# Groups:   bigregion, race [12]
   bigregion race    age obama
   <fct>     <fct> <dbl> <dbl>
 1 Northeast White    47     0
 2 Northeast White    61     1
 3 Northeast White    72     0
 4 Northeast White    43     0
 5 Northeast White    55     1
 6 Northeast White    53     1
 7 Northeast White    71     0
 8 Northeast Black    32     1
 9 Northeast Black    60     1
10 Northeast White    76     0
# ℹ 1,718 more rows

Recap on tidyverse: Summarizing


gss_sm %>%
  select(bigregion, race, age, obama) %>%
  drop_na() %>%
  group_by(bigregion, race) %>%
  summarize(
    total = n(),
    obama_share = mean(obama),
    obama_sd = sd(obama),
    age_mean = mean(age)
  )
# A tibble: 12 × 6
# Groups:   bigregion [4]
   bigregion race  total obama_share obama_sd age_mean
   <fct>     <fct> <int>       <dbl>    <dbl>    <dbl>
 1 Northeast White   259       0.645    0.480     55.2
 2 Northeast Black    39       0.974    0.160     50  
 3 Northeast Other    15       0.867    0.352     46.4
 4 Midwest   White   360       0.586    0.493     55.4
 5 Midwest   Black    62       1        0         51.2
 6 Midwest   Other    20       0.85     0.366     46.4
 7 South     White   404       0.413    0.493     56.4
 8 South     Black   179       0.944    0.230     47.1
 9 South     Other    26       0.731    0.452     51.0
10 West      White   280       0.539    0.499     54.4
11 West      Black    34       1        0         45.4
12 West      Other    50       0.64     0.485     53.7

Recap on tidyverse: Mutating


gss_sm %>%
  select(bigregion, race, age, obama) %>%
  drop_na() %>%
  group_by(bigregion, race) %>%
  summarize(
    total = n(),
    obama_share = mean(obama),
    obama_sd = sd(obama),
    age_mean = mean(age)
  ) %>%
  mutate(pct = total/sum(total))
# A tibble: 12 × 7
# Groups:   bigregion [4]
   bigregion race  total obama_share obama_sd age_mean    pct
   <fct>     <fct> <int>       <dbl>    <dbl>    <dbl>  <dbl>
 1 Northeast White   259       0.645    0.480     55.2 0.827 
 2 Northeast Black    39       0.974    0.160     50   0.125 
 3 Northeast Other    15       0.867    0.352     46.4 0.0479
 4 Midwest   White   360       0.586    0.493     55.4 0.814 
 5 Midwest   Black    62       1        0         51.2 0.140 
 6 Midwest   Other    20       0.85     0.366     46.4 0.0452
 7 South     White   404       0.413    0.493     56.4 0.663 
 8 South     Black   179       0.944    0.230     47.1 0.294 
 9 South     Other    26       0.731    0.452     51.0 0.0427
10 West      White   280       0.539    0.499     54.4 0.769 
11 West      Black    34       1        0         45.4 0.0934
12 West      Other    50       0.64     0.485     53.7 0.137 
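Note that pct above sums to 1 within each region, not overall: summarize() drops only the last grouping variable, so the result is still grouped by bigregion when mutate() runs. The sketch below shows the same behaviour on the built-in mtcars data (an assumption made so it runs without socviz); the object names are illustrative.

```r
library(dplyr)

by_cyl_gear <- mtcars |>
  group_by(cyl, gear) |>
  # "drop_last" is the default behaviour, written out explicitly
  summarize(total = n(), .groups = "drop_last")

# Still grouped by cyl: shares sum to 1 within each cyl
within_cyl <- by_cyl_gear |>
  mutate(pct = total / sum(total))

# After ungroup(): shares sum to 1 across all rows
overall <- by_cyl_gear |>
  ungroup() |>
  mutate(pct = total / sum(total))
```

Which version you want depends on the question: shares within a group or shares of the whole sample.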

ggplot2 thinking


https://visualizingsociety.com/

ggplot(data = [DATA], 
       mapping = aes([MAPPINGS])) +                # required
 [GEOM_FUNCTION]()   +                             # required
 [STAT_FUNCTION]()   +                             # not required
 [COORDINATE_FUNCTION]()  +                        # not required
 [SCALE_FUNCTION]()  +                             # not required
 [LABELS_FUNCTION]() +                             # not required
 [THEME_FUNCTION]()                                # not required
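One way to fill in the template, as a sketch only. It uses the mpg dataset that ships with ggplot2 (not the class data), and the particular geom, scale and theme choices are arbitrary examples of each slot.

```r
library(ggplot2)

p <- ggplot(data = mpg,                               # [DATA]
            mapping = aes(x = displ, y = hwy)) +      # [MAPPINGS]
  geom_point(alpha = 0.4) +                           # [GEOM_FUNCTION]
  stat_smooth(method = "lm", se = FALSE) +            # [STAT_FUNCTION]
  coord_cartesian(ylim = c(0, 50)) +                  # [COORDINATE_FUNCTION]
  scale_x_continuous(breaks = 2:7) +                  # [SCALE_FUNCTION]
  labs(x = "Engine displacement (litres)",
       y = "Highway miles per gallon") +              # [LABELS_FUNCTION]
  theme_minimal()                                     # [THEME_FUNCTION]

p
```

Only the first two pieces are required; every other line can be dropped and the plot will still draw.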

Bar chart: Data


gss_sm %>%
  select(bigregion, race, age, obama) %>%
  drop_na() %>%
  group_by(race) %>% # group by race only!
  summarize(total = n()) %>%
  mutate(pct = total/sum(total))
# A tibble: 3 × 3
  race  total    pct
  <fct> <int>  <dbl>
1 White  1303 0.754 
2 Black   314 0.182 
3 Other   111 0.0642

Bar chart: ggplot setup


gss_sm %>%
  select(bigregion, race, age, obama) %>%
  drop_na() %>%
  group_by(race) %>% # group by race only!
  summarize(total = n()) %>%
  mutate(pct = total/sum(total)) %>%
  ggplot(mapping = aes(x = race, y = pct))

Bar chart: Add bars/columns


gss_sm %>%
  select(bigregion, race, age, obama) %>%
  drop_na() %>%
  group_by(race) %>%
  summarize(total = n()) %>%
  mutate(pct = total/sum(total)) %>%
  ggplot(mapping = aes(x = race, y = pct)) +
  geom_col()

Bar chart: Add fill colors


gss_sm %>%
  select(bigregion, race, age, obama) %>%
  drop_na() %>%
  group_by(race) %>% # group by race only!
  summarize(total = n()) %>%
  mutate(pct = total/sum(total)) %>%
  ggplot(mapping = aes(x = race, y = pct, fill = race)) +
  geom_col()

Bar chart: Manage guides


gss_sm %>%
  select(bigregion, race, age, obama) %>%
  drop_na() %>%
  group_by(race) %>%
  summarize(total = n()) %>%
  mutate(pct = total/sum(total)) %>%
  ggplot(mapping = aes(x = race, y = pct, fill = race)) +
  geom_col() +
  scale_fill_brewer(type = "qual", guide = "none")

Bar chart: Add labels


gss_sm %>%
  select(bigregion, race, age, obama) %>%
  drop_na() %>%
  group_by(race) %>%
  summarize(total = n()) %>%
  mutate(pct = total/sum(total)) %>%
  ggplot(mapping = aes(x = race, y = pct, fill = race)) +
  geom_col() +
  scale_fill_brewer(type = "qual", guide = "none") + 
  labs(x = NULL, y = "Percent")
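The y axis is labelled "Percent" but pct is a proportion between 0 and 1. One option, sketched below, is to format the axis ticks as percentages with scales::label_percent(). To keep the example self-contained it rebuilds the three shares from the printed output above as a small data frame (race_shares is an illustrative name).

```r
library(ggplot2)

# Shares copied from the summary table printed earlier
race_shares <- data.frame(
  race = factor(c("White", "Black", "Other"),
                levels = c("White", "Black", "Other")),
  pct  = c(0.754, 0.182, 0.0642)
)

p_pct <- ggplot(race_shares, aes(x = race, y = pct, fill = race)) +
  geom_col() +
  scale_fill_brewer(type = "qual", guide = "none") +
  # show proportions as percentages on the axis
  scale_y_continuous(labels = scales::label_percent()) +
  labs(x = NULL, y = "Percent")

p_pct
```

The scales package is installed together with ggplot2, so no extra installation should be needed.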

Bar chart: Split plots


gss_sm %>%
  select(bigregion, race, age, obama) %>%
  drop_na() %>%
  group_by(bigregion, race) %>% # group by race and region!
  summarize(total = n()) %>%
  mutate(pct = total/sum(total)) %>%
  ggplot(mapping = aes(x = race, y = pct, fill = race)) +
  geom_col() +
  scale_fill_brewer(type = "qual", guide = "none") + 
  labs(x = NULL, y = "Percent") +
  facet_wrap(~ bigregion, nrow = 1)

Bar chart: Add theme


gss_sm %>%
  select(bigregion, race, age, obama) %>%
  drop_na() %>%
  group_by(bigregion, race) %>% # group by race and region!
  summarize(total = n()) %>%
  mutate(pct = total/sum(total)) %>%
  ggplot(mapping = aes(x = race, y = pct, fill = race)) +
  geom_col() +
  scale_fill_brewer(type = "qual", guide = "none") + 
  labs(x = NULL, y = "Percent") +
  facet_wrap(~ bigregion, nrow = 1) +
  theme_minimal()

Scatterplot: Data


gss_sm %>%
  select(bigregion, sex, income16, age) %>%
  drop_na()
# A tibble: 2,589 × 4
   bigregion sex    income16           age
   <fct>     <fct>  <fct>            <dbl>
 1 Northeast Male   $170000 or over     47
 2 Northeast Male   $50000 to 59999     61
 3 Northeast Male   $75000 to $89999    72
 4 Northeast Female $170000 or over     43
 5 Northeast Female $170000 or over     55
 6 Northeast Female $60000 to 74999     53
 7 Northeast Male   $170000 or over     50
 8 Northeast Female $30000 to 34999     23
 9 Northeast Male   $60000 to 74999     45
10 Northeast Male   $60000 to 74999     71
# ℹ 2,579 more rows

Scatterplot: ggplot setup


gss_sm %>%
  select(bigregion, sex, income16, age) %>%
  drop_na() %>%
  ggplot(mapping = aes(x = age, 
                       y = as.integer(income16)))
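A short illustration of what as.integer() does to a factor such as income16: it returns the position of each value among the factor levels, not a dollar amount. The three labels below are taken from the printout above; the factor income itself is a made-up toy.

```r
# Toy factor with levels in increasing order of income
income <- factor(
  c("$30000 to 34999", "$170000 or over", "$60000 to 74999"),
  levels = c("$30000 to 34999", "$60000 to 74999", "$170000 or over")
)

# Integer codes are positions in levels(income)
income_codes <- as.integer(income)
income_codes
# 1 3 2

# The codes map back to the labels
levels(income)[income_codes]
```

This treats neighbouring categories as equally spaced, which is a simplification to keep in mind when reading the scatterplot.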

Scatterplot: Add points


gss_sm %>%
  select(bigregion, sex, income16, age) %>%
  drop_na() %>%
  ggplot(mapping = aes(x = age,
                       y = as.integer(income16))) +
  geom_point(alpha = .2)

Scatterplot: Add trend


gss_sm %>%
  select(bigregion, sex, income16, age) %>%
  drop_na() %>%
  ggplot(mapping = aes(x = age,
                       y = as.integer(income16))) +
  geom_point(alpha = .2) +
  stat_smooth(se = FALSE)

Scatterplot: Add labels


gss_sm %>%
  select(bigregion, sex, income16, age) %>%
  drop_na() %>%
  ggplot(mapping = aes(x = age,
                       y = as.integer(income16))) +
  geom_point(alpha = .2) +
  stat_smooth(se = FALSE) +
  labs(x = "Age", y = "Income Category")

Scatterplot: Split plots


gss_sm %>%
  select(bigregion, sex, income16, age) %>%
  drop_na() %>%
  ggplot(mapping = aes(x = age,
                       y = as.integer(income16))) +
  geom_point(alpha = .2) +
  stat_smooth(se = FALSE) +
  labs(x = "Age", y = "Income Category") +
  # creates two-way grid of plots
  facet_grid(sex ~ bigregion) 

Scatterplot: Add theme


gss_sm %>%
  select(bigregion, sex, income16, age) %>%
  drop_na() %>%
  ggplot(mapping = aes(x = age,
                       y = as.integer(income16))) +
  geom_point(alpha = .2) +
  stat_smooth(se = FALSE) +
  labs(x = "Age", y = "Income Category") +
  facet_grid(sex ~ bigregion) +
  theme_classic()

Dot-whisker plot: Data


gss_sm %>%
  select(bigregion, race, obama) %>%
  # instead of drop_na() we can use filter()
  filter(!is.na(obama)) 
# A tibble: 1,730 × 3
   bigregion race  obama
   <fct>     <fct> <dbl>
 1 Northeast White     0
 2 Northeast White     1
 3 Northeast White     0
 4 Northeast White     0
 5 Northeast White     1
 6 Northeast White     1
 7 Northeast White     0
 8 Northeast Black     1
 9 Northeast Black     1
10 Northeast White     0
# ℹ 1,720 more rows

Dot-whisker plot: ggplot setup


gss_sm %>%
  select(bigregion, race, obama) %>%
  filter(!is.na(obama)) %>%
  ggplot(mapping = aes(x = obama,
                       y = bigregion,
                       color = bigregion))

Dot-whisker plot: Add points


gss_sm %>%
  select(bigregion, race, obama) %>%
  filter(!is.na(obama)) %>%
  ggplot(mapping = aes(x = obama,
                       y = bigregion,
                       color = bigregion)) +
  geom_point(alpha = .3)

Dot-whisker plot: Jitter points


gss_sm %>%
  select(bigregion, race, obama) %>%
  filter(!is.na(obama)) %>%
  ggplot(mapping = aes(x = obama,
                       y = bigregion,
                       color = bigregion)) +
  geom_jitter(width = .05, alpha = .3)

Dot-whisker plot: Add data summary


gss_sm %>%
  select(bigregion, race, obama) %>%
  filter(!is.na(obama)) %>%
  ggplot(mapping = aes(x = obama,
                       y = bigregion,
                       color = bigregion)) +
  geom_jitter(width = .05, alpha = .3) +
  stat_summary(fun.data = "mean_cl_normal",
               color = "black")
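Note that fun.data = "mean_cl_normal" relies on the Hmisc package being installed. As a rough sketch of what it reports (the mean with a t-based 95% confidence interval), the same quantities can be computed by hand in base R. The function mean_ci and the vector votes are hypothetical and only for illustration.

```r
# Mean and t-based confidence interval, written out by hand
mean_ci <- function(x, conf = 0.95) {
  x <- x[!is.na(x)]                       # drop missing values
  n <- length(x)
  se <- sd(x) / sqrt(n)                   # standard error of the mean
  half <- qt((1 + conf) / 2, df = n - 1) * se
  c(y = mean(x), ymin = mean(x) - half, ymax = mean(x) + half)
}

# Hypothetical 0/1 votes for one region
votes <- c(1, 0, 1, 1, 0, 1, 0, 1, 1, 1)
ci <- mean_ci(votes)
ci
```

The dot in the plot corresponds to y and the whiskers to ymin and ymax, computed separately for each region.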

Dot-whisker plot: Manage guides


gss_sm %>%
  select(bigregion, race, obama) %>%
  filter(!is.na(obama)) %>%
  ggplot(mapping = aes(x = obama,
                       y = bigregion,
                       color = bigregion)) +
  geom_jitter(width = .05, alpha = .3) +
  stat_summary(fun.data = "mean_cl_normal",
               color = "black") +
  scale_color_brewer(palette = "Dark2",
                     guide = "none")

Dot-whisker plot: Add labels


gss_sm %>%
  select(bigregion, race, obama) %>%
  filter(!is.na(obama)) %>%
  ggplot(mapping = aes(x = obama,
                       y = bigregion,
                       color = bigregion)) +
  geom_jitter(width = .05, alpha = .3) +
  stat_summary(fun.data = "mean_cl_normal",
               color = "black") +
  scale_color_brewer(palette = "Dark2",
                     guide = "none") + 
  labs(x = "Voted for Obama", y = NULL,
       title = "Obama vote shares by region")

Dot-whisker plot: Split plot


gss_sm %>%
  select(bigregion, race, obama) %>%
  filter(!is.na(obama)) %>%
  ggplot(mapping = aes(x = obama,
                       y = bigregion,
                       color = bigregion)) +
  geom_jitter(width = .05, alpha = .3) +
  stat_summary(fun.data = "mean_cl_normal",
               color = "black") +
  scale_color_brewer(palette = "Dark2",
                     guide = "none") + 
  labs(x = "Voted for Obama", y = NULL,
       title = "Obama vote shares by region and race") +
  facet_wrap(~ race, ncol = 1)

Dot-whisker plot: Add theme


gss_sm %>%
  select(bigregion, race, obama) %>%
  filter(!is.na(obama)) %>%
  ggplot(mapping = aes(x = obama,
                       y = bigregion,
                       color = bigregion)) +
  geom_jitter(width = .05, alpha = .3) +
  stat_summary(fun.data = "mean_cl_normal",
               color = "black") +
  scale_color_brewer(palette = "Dark2",
                     guide = "none") + 
  labs(x = "Voted for Obama", y = NULL,
       title = "Obama vote shares by region and race") +
  facet_wrap(~ race, ncol = 1) +
  theme_bw()

Resources



References

Cleveland, William S., and Robert McGill. 1984. “Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods.” Journal of the American Statistical Association 79 (387): 531–54. https://doi.org/10.1080/01621459.1984.10478080.
Davis, Russell, Xiaoying Pu, Yiren Ding, Brian D. Hall, Karen Bonilla, Mi Feng, Matthew Kay, and Lane Harrison. 2022. “The Risks of Ranking: Revisiting Graphical Perception to Model Individual Differences in Visualization Performance.” IEEE Transactions on Visualization and Computer Graphics, 1–16. https://doi.org/10.1109/TVCG.2022.3226463.
Heer, Jeffrey, and Michael Bostock. 2010. “Crowdsourcing Graphical Perception: Using Mechanical Turk to Assess Visualization Design.” In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 203–12.
King, Gary. 2006. “Publication, Publication.” PS: Political Science & Politics 39 (1): 119–25.
Munzner, Tamara. 2014. Visualization Analysis and Design. CRC Press.